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Abstract 



We consider the problem of computing the fc-sparse approximation to the discrete Fourier transform of an n- 
dimensional signal. We show: 

• An 0(k log n)-time randomized algorithm for the case where the input signal has at most k non-zero Fourier 
coefficients, and 

• An 0(k log n log(?i/fc))-time randomized algorithm for general input signals. 

Both algorithms achieve o(n log n) time, and thus improve over the Fast Fourier Transform, for any k = o(n). 
They are the first known algorithms that satisfy this property. Also, if one assumes that the Fast Fourier Transform 
is optimal, the algorithm for the exactly fc-sparse case is optimal for any k = n™ 1 ' . 

We complement our algorithmic results by showing that any algorithm for computing the sparse Fourier trans- 
form of a general signal must use at least Q(k log(n/fc) / log log n) signal samples, even if it is allowed to perform 
adaptive sampling. 

1 Introduction 

The discrete Fourier transform (DFT) is one of the most important and widely used computational tasks. Its 
applications are broad and include signal processing, communications, and audio/image/video compression. 
Hence, fast algorithms for DFT are highly valuable. Currently, the fastest such algorithm is the Fast Fourier 
Transform (FFT), which computes the DFT of an n-dimensional signal in 0(n log n) time. The existence of 
DFT algorithms faster than FFT is one of the central questions in the theory of algorithms. 

A general algorithm for computing the exact DFT must take time at least proportional to its output size, 
i.e., fl(n). In many applications, however, most of the Fourier coefficients of a signal are small or equal 
to zero, i.e., the output of the DFT is (approximately) sparse. This is the case for video signals, where a 
typical 8x8 block in a video frame has on average 7 non-negligible frequency coefficients (i.e., 89% of the 
coefficients are negligible) ICGX96I . Images and audio data are equally sparse. This sparsity provides the 
rationale underlying compression schemes such as MPEG and JPEG. Other sparse signals appear in com- 
putational learning theory 1KM91I ILMN93I . analysis of Boolean functions 0KKL88I |Q'D081 . compressed 
sensing lDon06l ICRT06], multi-scale analysis IDRZ071 , similarity search in databases BAFS93I . spectrum 
sensing for wideband channels HLVS1 II . and datacenter monitoring IMNLIOL 

For sparse signals, the D,(n) lower bound for the complexity of DFT no longer applies. If a signal has 
a small number k of non-zero Fourier coefficients - the exactly k-sparse case - the output of the Fourier 
transform can be represented succinctly using only k coefficients. Hence, for such signals, one may hope for 
a DFT algorithm whose runtime is sublinear in the signal size, n. Even for a general n-dimensional signal x 
- the general case - one can find an algorithm that computes the best k-sparse approximation of its Fourier 
transform, x, in sublinear time. The goal of such an algorithm is to compute an approximation vector x' that 
satisfies the following 1% /I2 guarantee: 



\\x - x'\\ 2 < C min ||J-y|| 2 , 



fc-sparse y 



(i) 
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where C is some approximation factor and the minimization is over fc-sparse signals. We allow the algorithm 
to be randomized, and only succeed with constant (say, 2/3) probability. 

The past two decades have witnessed significant advances in sublinear sparse Fourier algorithms. The first 
such algorithm (for the Hadamard transform) appeared in |KM9 1 1 (building on |GL89|). Since then, several 
sublinear sparse Fourier algorithms for complex inputs have been discovered ||Man92| |GGI + 02| IAGS03I 
IGMS05I HweTOl I Aka 1 01 IHIKP 1 2bl . These algorithms providtfl the guarantee in Equation 

The main value of these algorithms is that they outperform FFT's runtime for sparse signals. For very 
sparse signals, the fastest algorithm is due to 1GMS051 and has <3(fclog c (n) log(n/fc)) runtime, for somqj 
c > 2. This algorithm outperforms FFT for any fc smaller than 0(n/log a n) for some a > 1. For less 
sparse signals, the fastest algorithm is due to IHIKP 12b| , and has 0{yfnk log 3 / 2 n) runtime. This algorithm 
outperforms FFT for any fc smaller than Q(n/ log n). 

Despite impressive progress on sparse DFT, the state of the art suffers from two main limitations: 

1. None of the existing algorithms improves over FFT's runtime for the whole range of sparse signals, i.e., 
k = o(n). 

2. Most of the aforementioned algorithms are quite complex, and suffer from large "big-Oh" constants (the 
algorithm of [HIKP12b| is an exception, but has a running time that is polynomial in n). 

Results. In this paper, we address these limitations by presenting two new algorithms for the sparse Fourier 
transform. We require that the length n of the input signal is a power of 2. We show: 

• An 0(k log n)-time algorithm for the exactly fc-sparse case, and 

• An 0(k log n log(n/fc))-time algorithm for the general case. 

The key property of both algorithms is their ability to achieve o(n log n) time, and thus improve over the 
FFT, for any k = o(n). These algorithms are the first known algorithms that satisfy this property. Moreover, 
if one assume that FFT is optimal and hence the DFT cannot be computed in less than 0(n log n) time, the 
algorithm for the exactly fc-sparse case is optima^ as long as k = nf 1 ^. Under the same assumption, the 
result for the general case is at most one log log n factor away from the optimal runtime for the case of "large" 
sparsity k = n/ log ^ n. 

Furthermore, our algorithm for the exactly sparse case (depicted as Algorithm l3.1l on page 5) is quite sim- 
ple and has low big-Oh constants. In particular, our preliminary implementation of a variant of this algorithm 
is faster than FFTW, a highly efficient implementation of the FFT, for n = 2 22 and fc < 2 17 [jHIKP12al . In 
contrast, for the same signal size, prior algorithms were faster than FFTW only for k < 2000 MHIKP12bl FI 

We complement our algorithmic results by showing that any algorithm that works for the general case 
must use at least Q(fc log(n/fc) / log log n) samples from x. The lower bound uses techniques from OPWl II . 
which shows a lower bound of Vt(k log(n/fc)) for the number of arbitrary linear measurements needed to 
compute the fc-sparse approximation of an n-dimensional vector x. In comparison to HPW1 II . our bound is 
slightly worse but it holds even for adaptive sampling, where the algorithm selects the samples based on the 
values of the previously sampled coordinates]^ Note that our algorithms are non-adaptive, and thus limited 
by the more stringent lower bound of IPW1 II . 

'The algorithm of |Man92], as stated in the paper, addresses only the exactly fc-sparse case. However, it can be extended to the 
general case using relatively standard techniques. 

2 All of the above algorithms, as well as the algorithms in this paper, need to make some assumption about the precision of the input; 
otherwise, the right-hand-side of the expression in Equation (TJ contains an additional additive term. See Preliminaries for more details. 

3 The paper does not estimate the exact value of c. We estimate that c 3. 

4 One also needs to assume that fc divides n. See Section|3]for more details. 

5 Note that both numbers (fc < 2 1 " and fc < 2000) are for the exactly k-sparse case. The algorithm in |HIKP12b| can deal with the 
general case, but the empirical runtimes are higher. 

6 Note that if we allow arbitrary adaptive linear measurements of a vector then its fc-sparse approximation can be computed using 
only 0(fc log log(n/fc)) samples IIPW1 11 . Therefore, our lower bound holds only where the measurements, although adaptive, are 
limited to those induced by the Fourier matrix. This is the case when we want to compute a sparse approximation to x from samples of 
x. 
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Techniques - overview. We start with an overview of the techniques used in prior works. At a high level, 
sparse Fourier algorithms work by binning the Fourier coefficients into a small number of bins. Since the 
signal is sparse in the frequency domain, each bin is likeljO to have only one large coefficient, which can 
then be located (to find its position) and estimated (to find its value). The binning has to be done in sublinear 
time, and thus these algorithms bin the Fourier coefficients using an n-dimensional filter vector G that is 
concentrated both in time and frequency. That is, G is zero except at a small number of time coordinates, 
and its Fourier transform G is negligible except at a small fraction (about 1/fc) of the frequency coordinates, 
representing the filter's "pass" region. Each bin essentially receives only the frequencies in a narrow range 
corresponding to the pass region of the (shifted) filter G, and the pass regions corresponding to different bins 
are disjoint. In this paper, we use filters introduced in |HIKP12bl . Those filters (defined in more detail in 
Preliminaries) have the property that the value of G is "large" over a constant fraction of the pass region, 
referred to as the "super-pass" region. We say that a coefficient is "isolated" if it falls into a filter's super-pass 
region and no other coefficient falls into filter's pass region. Since the super-pass region of our filters is a 
constant fraction of the pass region, the probability of isolating a coefficient is constant. 

To achieve the stated running times, we need a fast method for locating and estimating isolated coef- 
ficients. Further, our algorithm is iterative, so we also need a fast method for updating the signal so that 
identified coefficients are not considered in future iterations. Below, we describe these methods in more 
detail. 

New techniques - location and estimation. Our location and estimation methods depends on whether we 
handle the exactly sparse case or the general case. In the exactly sparse case, we show how to estimate the 
position of an isolated Fourier coefficient using only two samples of the filtered signal. Specifically, we show 
that the phase difference between the two samples is linear in the index of the coefficient, and hence we 
can recover the index by estimating the phases. This approach is inspired by the frequency offset estimation 
in orthogonal frequency division multiplexing (OFDM), which is the modulation method used in modern 
wireless technologies (see IIHT011 . Chapter 2). 

In order to design an algorithm^] for the general case, we employ a different approach. Specifically, 
we can use two samples to estimate (with constant probability) individual bits of the index of an isolated 
coefficient. Similar approaches have been employed in prior work. However, in those papers, the index was 
recovered bit by bit, and one needed f2(log log n) samples per bit to recover all bits correctly with constant 
probability. In contrast, in this paper we recover the index one block of bits at a time, where each block 
consists of O (log log n) bits. This approach is inspired by the fast sparse recovery algorithm of IGLPS10I1 . 
Applying this idea in our context, however, requires new techniques. The reason is that, unlike in IG LPSlOll . 
we do not have the freedom of using arbitrary "linear measurements" of the vector x, and we can only use 
the measurements induced by the Fourier transform]^ As a result, the extension from "bit recovery" to "block 
recovery" is the most technically involved part of the algorithm. Section l4~Tl contains further intuition on this 
part. 

New techniques - updating the signal. The aforementioned techniques recover the position and the value 
of any isolated coefficient. However, during each filtering step, each coefficient becomes isolated only with 
constant probability. Therefore, the filtering process needs to be repeated to ensure that each coefficient is 
correctly identified. In !HIKP12bL the algorithm simply performs the filtering O(logn) times and uses the 
median estimator to identify each coefficient with high probability. This, however, would lead to a running 
time of 0(k log 2 n) in the fc-sparse case, since each filtering step takes k log n time. 

One could reduce the filtering time by subtracting the identified coefficients from the signal. In this 
way, the number of non-zero coefficients would be reduced by a constant factor after each iteration, so the 

7 One can randomize the positions of the frequencies by sampling the signal in time domain appropriately (pGI+02 GMS05 |. See 
Preliminaries for the description. 

8 We note that although the two-sample approach employed in our algorithm works in theory only for the exactly fc-sparse case, our 
preliminary experiments show that using a few more samples to estimate the phase works surprisingly well even for general signals, 
'in particular, the method of |GLPS10] uses measurements corresponding to a random error correcting code. 
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cost of the first iteration would dominate the total running time. Unfortunately, subtracting the recovered 
coefficients from the signal is a computationally costly operation, corresponding to a so-called non-uniform 
DFT (see 1GST081 for details). Its cost would override any potential savings. 

In this paper, we introduce a different approach: instead of subtracting the identified coefficients from 
the signal, we subtract them directly from the bins obtained by filtering the signal. The latter operation can 
be done in time linear in the number of subtracted coefficients, since each of them "falls" into only one bin. 
Hence, the computational costs of each iteration can be decomposed into two terms, corresponding to filtering 
the original signal and subtracting the coefficients. For the exactly sparse case these terms are as follows: 

• The cost of filtering the original signal is 0(B logn), where B is the number of bins. B is set to O(k'), 
where k' is the the number of yet-unidentified coefficients. Thus, initially B is equal to O(k), but its value 
decreases by a constant factor after each iteration. 

• The cost of subtracting the identified coefficients from the bins is 0(k). 

Since the number of iterations is 0(log k), and the cost of filtering is dominated by the first iteration, the total 
running time is 0(k log n) for the exactly sparse case. 

For the general case, we need to set k' and B more carefully to obtain the desired running time. The 
cost of each iterative step is multiplied by the number of filtering steps needed to compute the location of 
the coefficients, which is 0(log(n/£?)). If we set B = Q(k'), this would be O(logn) in most iterations, 
giving a 0(fclog 2 n) running time. This is too slow when k is close to n. We avoid this by decreasing B 
more slowly and k' more quickly. In the r-th iteration, we set B = fc/poly(r). This allows the total number 
of bins to remain O(k) while keeping log(n/i?) small — at most O(loglogfc) more than log(n/fc). Then, 
by having k' decrease according to k' = k/r e ( r > rather than k/2 B ^ r \ we decrease the number of rounds 
to 0(logfc/loglogfc). Some careful analysis shows that this counteracts the log log k loss in the log(n/B) 
term, achieving the desired 0(k log n log(n/fc)) running time. 

Organization of the paper. In Section^ we give notation and definitions used throughout the paper. Sec- 
tions [3] and H] give our algorithm in the exactly /c-sparse and the general case, respectively. Section [5] gives 
the reduction to the exactly fc-sparse case from a fc-dimensional DFT. Section [6] gives the sample complex- 
ity lower bound for the general case. Section [7] describes how to efficiently implement our filters. Finally, 
Section[8]discusses open problems arising from this work. 

2 Preliminaries 

This section introduces the notation, assumptions, and definitions used in the rest of this paper. 

Notation. We use [n] to denote the set {1, ... , n}, and define to = e ~ 27ri / n to be an nth root of unity. For 
any complex number a, we use cj)(a) G [0, 2ir] to denote the phase of a. For a complex number a and a real 
positive number b, the expression a ± b denotes a complex number a' such that \a — a'\ < b. For a vector 
x G C n , its support is denoted by supp(x) C [n]. We use ||a;|| to denote |supp(a;)|, the number of non-zero 
coordinates of x. Its Fourier spectrum is denoted by x, with 



For a vector of length n, indices should be interpreted modulo i. This allows us to define 

convolution 



je[n] 

and the coordinate-wise product (x ■ y)i = Xiiji, so 5T~y = x*y. 

When i G Z is an index into an n-dimensional vector, sometimes we use |z| to denote min^ ( mod „) \j\. 
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Definitions. The paper uses two tools introduced in previous papers: (pseudorandom) spectrum permuta- 
tion llGGI+02l IGMS05I IGST08I and flat filtering windows lHIKP12bl . 

Definition 2.1. Suppose a -1 exists mod n. We define the permutation P a ,a,b by 

(Pa;a,bX)i = x a (i_ a - ) LL> crbl . 

We also define Ha,b{i) = &(i — b) mod n. 

Claim 2.2. P^x^^ =x l uj aa \ 

Proof. 



^ irrfi — a) 

"<j(j-a) 



in 



□ 

Definition 2.3. Wfc say f/jaf (G, G') = (GB,s, a , G' B,8,a) G K" x « a flat window function vv/f/z param- 
eters B > 1, 6 > 0, ami a > |f|supp(G)| = log(n/<5)) and G' satisfies 

• G'i = 1/or |i| < (1 - a)n/(2B) 

• G~'i = Ofor \i\ > n/(2B) 

• Gi G [0, 1] for alii 



G'-G 



<<y. 



oo 



The above notion corresponds to the (1/(2.8), (1 — a)/(2B), 5, 0(B / alog(n/ 5))-fi&t window function 
in BHIKP12bl . In Section [7] we give efficient constructions of such window functions, where G can be 
computed in 0(— log(n/<5)) time and for each i, G'i can be computed in 0(log(n/S)) time. Of course, for 
% £ [(1 - a)n/(2B), n/(2B)], G'i G {0, 1} can be computed in 0(1) time. 

The fact that G'i takes w(l) time to compute for i G [(1 — a)n/ (25), n/(2S)] will add some complexity 
to our algorithm and analysis. We will need to ensure that we rarely need to compute such values. A practical 
implementation might find it more convenient to precompute the window functions in a preprocessing stage, 
rather than compute them on the fly. 

We use the following lemma from [HIKP12b|: 

Lemma 2.4 (Lemma 3.6 of [HIKP12b]). If j ^ 0, n is a power of two, and a is a uniformly random odd 
number in [n], then Pv[o~j £ [— G, C] (mod n)] < AC/n. 
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Assumptions. Through the paper, we require that n, the dimension of all vectors, is an integer power of 2. 
We also make the following assumptions about the precision of the vectors x: 

• For the exactly fc-sparse case, we assume that x.i G {— L, . . . ,L} for some precision parameter L. To 
simplify the bounds, we assume that L = n°^; otherwise the logn term in the running time bound is 
replaced by log L. 

• For the general case, we only achieve Equation if ||x|| 2 < n°( x ' ■ nrnifc. sparse y \\x — y\\ 2 . In gen- 
eral, for any parameter 5 > we can add 8 \ \x\\ 2 to the right hand side of Equation (Q]i and run in time 

0{k\og{n/k)\og{n/5)). 

3 Algorithm for the exactly sparse case 

In this section we assume Xi G {— L, . . . , L}, where L < n c for some constant c > 0, and x is fc-sparse. 
We choose 5 = l/(4n 2 L). The algorithm (NOISELESSSPARSEFFT) is described as Algorithm [XT] The 
algorithm has three functions: 

• HashToBins. This permutes the spectrum of x — z with P a . a .b, then "hashes" to B bins. The guarantee 
will be described in Lemma |331 

• NoiSELESsSPARSEFFTlNNER. Given time-domain access to x and a sparse vector z such that x — z is 
fc'-sparse, this function finds "most" of x — z. 

• NoiselessSparseFFT. This iterates NoiselessSparseFFTInner until it finds x exactly. 

We analyze the algorithm "bottom-up", starting from the lower-level procedures. 

Analysis of NoiselessSparseFFTInner and HashToBins. For any execution of NoiselessS- 
parseFFTInner, define the support S = supp(x — z). Recall that x a ,b(i) = cr (* — b) mod n. Define 
/io-,b(i) = round(7r CT .b(i)_B/n) and 0^(1) = n a .b{i) — h a ,b{i)n/ B. Note that therefore lo^f?)! < n/(2B). 
We will refer to h a ^(i) as the "bin" that the frequency i is mapped into, and o a ^(i) as the "offset". For any 
i £ S define two types of events associated with i and S and defined over the probability space induced by a 
and b: 

• "Collision" event E co a(i): holds iff h a ,b(i) e h a ^(S \ {«}), and 

• "Large offset" event E Q ff(i): holds iff |o CT .b(i)| > (1 — a)n/(2B). 

Claim 3.1. Foranyi £ S, the event E co u(i) holds with probability at most 4\S\/ B. 
Proof. Consider distinct i,j G S. By Lemma |2.4| 

Pr[h a . b (i) = h a ^ b (j)} < Pr[7r CT . b (i) - ir a . b (j) mod n £ [-n/B,n/B]] 
= Pi[a(i — j) mod n G [—n/B, n/B]] 
< A/B. 

By a union bound over j G S, Pr{E co u(i)} <4\S\/B. □ 
Claim 3.2. For any i G S, the event E ff(i) holds with probability at most a. 

Proof. Note that o a ^(i) = Tr a ^(i) = a(i — b) (mod n/B). For any odd a and any I G [n/B], we have 
that Pi'b[(7(i — b) = I (mod n/B)] = B/n. Since only an/B offsets o cr .h(i) cause E a ff(i), the claim 
follows. □ 
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procedure HashToBins(x, z, P a , a ,b, B, S, a) 

Compute yjn/B for j € [B], where y = Gb,«,5 ■ (P<r, a ,bx) 

Compute y' jn / B = Vjn/B - {G' B , a ,s * P^,a,bz) jn / B for j € [B] 
return u given by Uj = y 1 ' j n m- 
end procedure 

procedure NoiselessSparseFFTInner(x, k', z, a) 
Let B = k' / f3, for sufficiently small constant (3. 
Let 5 = l/(4n 2 L). 

Choose cr uniformly at random from the set of odd numbers in [n] . 
Choose b uniformly at random from [n]. 
u <- HashToBins(x, z, P a ,o,b, B, S, a), 
u' «- HASHToBiNs(a;, z, P a .i,b, B, 5, a). 

W 4- 0. 

Compute J = { j : \uj\ > 1/2}. 
for j e J do 

a <— Uj/u'y 

i <s— cr -1 (round(0(a) ^-)) mod n. > </>(a) denotes the phase of a. 

v round(2j). 

Wi <— u. 
end for 
return w 
end procedure 

procedure NoiselessSparseFFT(x, k) 

for £ G 0, 1, ... , log k do 

kt 4- k/2\ a t <- 6(2"*). 

Z <- Z + NOISELESSSPARSEFFTlNNER(a;, k t ,z 7 a t ). 
end for 
return z 
end procedure 

Algorithm 3.1: Exact fc-sparse recovery 
Lemma 3.3. Suppose B divides n. The output u of HashToBins satisfies 

LetQ = \{i G supp(z) | E jf(i)}\. The running time of HashToBins is log(n/<5)+||z|| +C \og(n/5)). 
Proof. Define the flat window functions G = Gs,s,a and G' = G'b,6,<x- We have 

y = G ■ Pa,a,bX = G* P a ,a,bX 

y> = G 7 * P^^T- z) + (G-G~>)* P72b~x 

By Claim l2~2l the coordinates of P a , a ,bX and a; have the same magnitudes, just different ordering and phase. 
Therefore 

(G -G~') * P^x 



< 



G-G> 



Pr, 



,bX 



<5\\x\\ 
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and hence 

% = V'jn/B = G '-l( P <T,a,b( X - Z ))jn/B+l ± 6 Pill 

\l\<n/{2B) 

G 'jn/B-^ tb w{Pa,a,b{x-z)) VtrAi) ±5\\x\\i 

W<r, b (i)-jn/B\<n/(2B) 
'><r.b(i)=J 

as desired. 

We can compute HashToBins via the following method: 

1. Compute y with \\y\\ Q = 0(§ \og(n/S)) in 0(§ log(n/<5)) time. 

2. Compute v e C B given by f l = £\ J/i+j'B- 

3. Because B divides n, by the definition of the Fourier transform (see also Claim 3.7 of IHIKP12bll ) we have 
Vjn/B = Vj for all j. Hence we can compute it with a B-dimensional FFT in 0(B log B) time. 

4. For each coordinate i <E supp(z), decrease y^% a b (i) by G'_ QtT b ^z i uj a ' Jl . This takes O(||z1| + £log(n/(5)) 
time, since computing G'_ Q<T b (j) takes 0(log(n/5)) time if E a ff(i) holds and 0(1) otherwise. ^ 

Lemma 3.4. Consider any i G S such that neither E co u(i) nor E a f f(i) holds. Let j ~ ha-^ii). Then 

TL 

round((j)(uj /u'A) — ) = ai (mod n), 

Z7T 

round(uj) =Xi 

and j G J. 

Proof. We know that || sj| x < fc|!^l!oo < kL < nL. Then by Lemma [3~3l and Ernii(i) not holding, 

w, = (x - z) 4 G'- OCT b (i) ± <5nL. 
Because E a ff(i) does not hold, G'_ 0ct = 1, so 

= (x — z) i ± 8nL. (2) 

Similarly, 



u' = - z) ■w'" ± <5nL 



Then because SnL < 1 < 



(x - z), 



the phase is 

(f>(uj) = ± sin-^fei) = ± 2(5ni 
and 0(2$) = -ai^ ± 2<5nL. Thus ^{uj/u'j) = ai^ ± 4<5nL = cri^ ± 1/raby the choice of 5. Therefore 

71 

round(0(uj/?2j) — ) = erz (mod n). 
Also, by Equation (O, round(u :/ ) =Xi— Finally, |round(uj)| = |x; — %\ > 1, so | > 1/2. Thus 

j e J. □ 
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For each invocation of NoiselessSparseFFTInner, let P be the the set of all pairs (i, v) for which 
the command Wi v was executed. Claims 13.11 and 13.21 and Lemma 13.41 together guarantee that for each 
i E S the probability that P does not contain the pair (i, (x — is at most 4\S\/B + a. We complement 
this observation with the following claim. 

Claim 3.5. For any j G J we have j G h a ,b(S). Therefore, \ J\ = \P\ < \S\. 

Proof. Consider any j ^ h^^S). From Equation (l2b in the proof of Lemma l3.4l it follows that \uj \ < SnL < 
1/2. □ 

Lemma 3.6. Consider an execution of NoiselessSparseFFTInner, and let S = supp(x— z). If\S\ < k', 
then 

E[\\x-z-w\\ ] < 8(/3 + a)\S\. 

Proof. Let e denote the number of coordinates i G S for which either E co u(i) or E ff(i) holds. Each such 
coordinate might not appear in P with the correct value, leading to an incorrect value of Wi. In fact, it might 
result in an arbitrary pair (i 1 , v') being added to P, which in turn could lead to an incorrect value of W{i. By 
Claim [3~5l these are the only ways that w can be assigned an incorrect value. Thus we have 

1 1 3? — z — w\\q < 2e. 

Since E[e] < (A\S\/B + a)\S\ < (4/3 + a)\S\, the lemma follows. □ 

Analysis of NOISELESSSPARSEFFT. Consider the tth iteration of the procedure, and define St = supp(ir— 
z) where z denotes the value of the variable at the beginning of loop. Note that |5o| = | supp(ir)| < k. 

We also define an indicator variable I t which is equal to iff |5* |/|5t_i| < 1/8. If It = 1 we say the the 
tth iteration was not successful. Let 7 = 8- 8(/3 + a). From Lemma [3~l6l it follows that Pr[/ t = 1 | |5t_i| < 
k/2 t ~ 1 ] < 7. From Claim [331 it follows that even if the tth iteration is not successful, then |5t|/|5(_i| < 2. 

For any t > 1, define an event E(t) that occurs iff Ei=i J ^ ^ f / 2 - Observe that if none of the events 
E(l)...E(t) holds then \S t \ < fc/2*. 

Lemma 3.7. Let E = E(l) U . . .Li E(X) for A = 1 + logfc Assume that (Aj) 1 / 2 < 1/4. Then Pr[E] < 1/3. 
Proof. Let t' = \t/2\. We have 

Pr[E(t)}< Q7 t '<2V'<(4 7 ) t/2 

Therefore 

Pr[E] < J2 Pr[E(t)] < < 1/4 ■ 4/3 = 1/3. 

□ 

Theorem 3.8. Suppose x is k-sparse with entries from {—L, . . . , L} for some known L = n ^. Then the 
algorithm NOISELESSSPARSEFFT runs in expected O(fclogn) time and returns the correct vector x with 
probability at least 2 /3. 

Proof. The correctness follows from Lemma [3/71 The running time is dominated by 0(log k) executions of 
HashToBins. 

Assuming a correct run, in every round t we have 

\\z\\ < \\x\\ + \S t \ <fc + fc/2 4 <2k. 

Therefore 

E[\{i G supp(z) I E off (i)}\] < a\\z\\ < 2afc, 
so the expected running time of each execution of HashToBins is 0(— log(n/<5) + k + ak \og(n/S)) = 
logn + k + aklogn). Setting a = 9(2 _t / 2 ) and @ = 0(1), the expected running time in round t is 
0{2~ l / 2 k log n + k + 2~ t l 2 k log n). Therefore the total expected running time is 0(k log n). □ 
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4 Algorithm for the general case 

This section shows how to achieve Equation (Q]) for C = 1 + e. Pseudocode is in Algorithm ic ll and !4. 21 

4.1 Intuition 

Let S denote the "heavy" O(kfe) coordinates of x. The overarching algorithm SparseFFT works by first 
"locating" a set L containing most of S, then "estimating" xl to get z. It then repeats on x — z. We will 
show that each heavy coordinate has a large constant probability of both being in L and being estimated well. 
As a result, x — z is probably nearly k / 4-sparse, so we can run the next iteration with k — > fc/4. The later 
iterations then run faster and achieve a higher success probability, so the total running time is dominated by 
the time in the first iteration and the total error probability is bounded by a constant. 

In the rest of this intuition, we will discuss the first iteration of SparseFFT with simplified constants. 
In this iteration, hashes are to B = 0(k/e) bins and, with 3/4 probability, we get z so x — z is nearly 
fc/4-sparse. The actual algorithm will involve a parameter a in each iteration, roughly guaranteeing that 
with 1 — yfa probability, we get z so x — z is nearly -^/afc-sparse; the formal guarantee will be given by 
Lemma [4~8] For this intuition we only consider the first iteration where a is a constant. 

Location. As in the noiseless case, to locate the "heavy" coordinates we consider the "bins" computed 
by HashToBins with P a , a ,b- This roughly corresponds to first permuting the coordinates according to the 
(almost) pairwise independent permutation P a , a .b, partitioning the coordinates into B = 0(k/e) "bins" of 
n/B consecutive indices, and observing the sum of values in each bin. We get that each heavy coordinate 
i has a large constant probability that the following two events occur: no other heavy coordinate lies in the 
same bin, and only a small (i.e., 0(e/k)) fraction of the mass from non-heavy coordinates lies in the same 
bin. For such a "well-hashed" coordinate i, we would like to find its location t = TT a fi(i) = a(i — b) among 
the en/k < njk consecutive values that hash to the same bin. Let 

9* = — (j + ab) (mod27r). (3) 

J n 

so 9* = ^-cri. In the noiseless case, we showed that the difference in phase in the bin using P a ,o,b and 
using P a ,i,b is S* plus a negligible 0(5) term. With noise this may not be true; however, we can say for 
any (3 € [n] that the difference in phase between using P av a,b and P a a +0 b> as a distribution over uniformly 
random a £ [n], is (39* + v with (for example) E[i/ 2 ] = 1/100 (all operations on phases modulo 2tt). We 
can only hope to get a constant number of bits from such a "measurement". So our task is to find r within a 
region Q of size njk using 0(\og(n/k)) "measurements" of this form. 

One method for doing so would be to simply do measurements with random /3 € [n). Then each mea- 
surement lies within 7r/4 of (39* with at least 1 — > 3/4 probability. On the other hand, for j =/= r and 
as a distribution over /?, f3(0* — 0*) is roughly uniformly distributed around the circle. As a result, each mea- 
surement is probably more than 7r/4 away from (39*. Hence 0(\og(n/k)) repetitions suffice to distinguish 
among the n/k possibilities for r. However, while the number of measurements is small, it is not clear how 
to decode in polylog rather than VL{n/k) time. 

To solve this, we instead do a t-ary search on the location for t = O(logn). At each of 0(log t (n/k)) 
levels, we split our current candidate region Q into t consecutive subregions Qi , . . . , Q t , each of size w. Now, 
rather than choosing (3 6 [n], we choose (3 £ [jf^, By the upper bound on (3, for each q e [t] the values 
{(39* | j G Qq] all lie within (3w^- < ir/4 of each other on the circle. On the other hand, if \ j — t\ > 16w, 
then (3(9* — 9*) will still be roughly uniformly distributed about the circle. As a result, we can check a 
single candidate element e q from each subregion: if e q is in the same subregion as r, each measurement 
usually agrees in phase; but if e q is more than 16 subregions away, each measurement usually disagrees in 
phase. Hence with 0(\ogt) measurements, we can locate r to within 0(1) consecutive subregions with 
failure probability l/i e W. The decoding time is 0(t log t). 
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procedure SparseFFT(:e, k, e, 5) 

R <— 0(log k/ log log k) as in Theorem |4.9l 

z« <- 

for r e [R] do 

Choose _B r , fe r , a r as in Theorem l4.9l 
i? es t <- 0(log(^)) as in LemmaEl 

i r «- L0CATESlGNAL(a;,z('' :) ,B I .,a r ,5) 

J<r+1) j<r) +ESTIMATEVALUES(x,2 (r \ 3k r ,L r ,B r ,8,R est ). 
end for 
return 
end procedure 

procedure Estimate Values(ie, z, k', L, B, 5, R est ) 
for r g [i? est ] do 

Choose a.,,, b r G [n] uniformly at random. 

Choose oy uniformly at random from the set of odd numbers in [n] . 

y;(r) UAStlToBWS(x,Z,Pa,a r ,b,B,5). 

end for 

W <- 

for ieldo 

median r u y h b (j\U artT4 - > Separate median in real and imaginary axes. 

end for 

J <- argmaxiJi^, \\wj\\ 2 . 
return wj 
end procedure 

Algorithm 4.1: fc-sparse recovery for general signals, part 1/2. 



This primitive LocateInner lets us narrow down the candidate region for r to a subregion that is 
at' = fl(t) factor smaller. By repeating LocateInner log t , (n/k) times, LocateSignal can find r 
precisely. The number of measurements is then 0(logtlog t (n/fc)) = 0(log(n/fc)) and the decoding time 
is O (t log t\og t (n/k)) = 0(\og(n/k) logn). Furthermore, the "measurements" (which are actually calls to 
HashToBins) are non-adaptive, so we can perform them in parallel for all 0(k/e) bins, with 0(log(n/<5)) 
average time per measurement. This gives 0(k \og(n/k) log(n/<5)) total time for LOCATES IGNAL. 

This lets us locate every heavy and "well-hashed" coordinate with l/t e ^ = o(l) failure probability, so 
every heavy coordinate is located with arbitrarily high constant probability. 

Estimation. By contrast, estimation is fairly simple. As in Algorithm 13. II we can estimate (x — z) i as 
u ht7 where u is the output of HashToBins. Unlike in Algorithm 13.11 we now have noise that may 
cause a single such estimate to be poor even if i is "well-hashed". However, we can show that for a random 
permutation P a , a ,b the estimate is "good" with constant probability. EstimateValues takes the median of 
Rest = 0(log -) such samples, getting a good estimate with 1 — e/64 probability. Given a candidate set L 
of size k/e, with 7/8 probability at most fc/8 of the coordinates are badly estimated. On the other hand, with 
7/8 probability, at least 7fc/8 of the heavy coordinates are both located and well estimated. This suffices to 
show that, with 3/4 probability, the largest k elements J of our estimate w contains good estimates of 3k/ 4 
large coordinates, so x — z — wj is close to fc/4-sparse. 

4.2 Formal definitions 

As in the noiseless case, we define n a = a(i — b) mod n, h a b(i) = round(7r (T ^(i)B/n) and o CT = 
^ry.b{i) — h a j,{i)n/ B. We say h a ^(i) is the "bin" that frequency i is mapped into, and o CT .b(i) is the "offset". 
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prUCclILirc LwLAI JiolLriNAL-^X, 2-, JJ, Ct, 0) 




Choose uniformly at random a, b G [n] with cr odd. 




Initialize = (i — l)n/ b> tor i £ [ir/J. 




Let w = n/B, t = log n, t' = t/4, D max = \og t , (w + 1). 




Let Rioc = V(\og 1/a {t/a)) per Lemma|4.5l 




for D G [D mai ] do 




l^-t-i) ^_ LocateInner(.t, z, B, o, a, a, p, l { wo/(t ) 


, t, Kloc) 


end for 








return L 




end procedure 


> o, a parameters for G, G 


> (ti,n +«;;,.. 


■ ,{Ib,Ib + w) the plausible regions. 
> B « fe/e the number of bins 


> i w log 


• 77 the number of regions to split into. 


> i?( oc « log i = 


log log 77 the number of rounds to run 


proceuure LOLAihiNNbK^x, z, _t>, o, o;, cr, o, t, ttj, t, ±ti oc ) 




Let s = 9(a i/,:i ). 




T et ?; — f) for ( n a\ (= f 7?1 X M 




for r G [-Rjoc] do 




Choose a 6 [n] uniformly at random. 




Choose Be 14^, . . . , 4P^} uniformly at random. 




U HASHToBlNS(x, Z, P a .a.b, B, 6, a). 




U 1 <— HASHToBlNS(x, Z, Pa,a+/3.b, B, S, a). 




for j G [B] do 




c„ ^ — d>(u^ /u' ) 

3 t \ 3 1 j J 




for q £ [i] do 




/ 7 i 9-1/2 

777 A „ 4 — / -4- — ' — 7 / 1 

" l 3,1 ~ L 3 ~ t 






Oi „ i — — mod 2tt 

J>y n 




it vaijx\\puj q — Cj\ , Z7T — \ptfjn — Cj|j < S7T men 








end if 




end for 




end for 




end for 




for j G [-B] do 




Q* <- {c/ G It] \v iQ > Rioc/2} 




if g* ^ then 




^ min g6 g» lj + ^y-w 




else 




i<<-_L 




end if 




end for 




return i' 




end procedure 




Algorithm 4.2: fc-sparse recovery for general si 


gnals, part 2/2. 
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We define h a \{j) = {i G [n] | = j}. 

Define 

Err(x, fc) = min \\x — y\\ 2 ■ 

/c-sparse y 

In each iteration of SparseFFT, define x! = x — % and let 

p 2 = Err 2 (x',fc) + 8 2 n\\x\\ 2 
p 2 = ep 2 /k 

S = {ie[n}\ |^| 2 >M 2 } 

2 



Then \S\ < (1 + l/e)fc = 0(fe/e) and 



< (1 + e)p . We will show that each i £ S is found by 



LOCATESIGNAL with probability 1 - 0(a), when B = fi(^). 

For any i £ S define three types of events associated with i and S and defined over the probability space 
induced by a and b: 



• "Collision" event E co u(i): holds iff h a j,(i) € h a ^(S \ {«}); 

• "Large offset" event E ff(i): holds iff | o cr; b(i) | > (1 — a)n/ (2B); and 

.2/ 



"Large noise" event E no i se (i): holds iff 



> EiT 2 (x',k)/(aB) 



By Claims|3l]and[l2l Pr[E coU (i)} < 4 \S\ /B = 0(a) and Pr[E off (i)} < 2a for any i e S. 
Claim 4.1. For any i e 5, Pr[i?„ olse (i)] < 4a. 

Proo/ For each j ^ i, Pr^^j) = < Pr[|<rj - ai\ < n/B] < 4/ B by Lemma El Then 

2 



X h~l(/»o-,i.(<))\S 



2 






a:'[n]\s 



The result follows by Markov's inequality. 



□ 



We will show for i e S that if none of E co u(i), E a ff(i), and E noise (i) hold then SparseFFTInner 
recovers x\ with 1 — 0(a) probability. 

Lemma 4.2. Let a <= [n] uniformly at random, B divide n, and the other parameters be arbitrary in 

u = HashToBins(:e, z, P a , a ,b, B, 5, a). 
Then for any i € [n] with j = h a> b(i) and none of E co u(i), E ff(i), or E no i se (i) holding, 



E[ 



Uj — x'jUJ 



2 P 2 
1 ~ aB 



Proof Let G' = G' B ,8,c Let T = h~^ b (j) \ {i}. We have that T n S = {} and G'_ 0(j b(l) 
Lemma[3~3l 



1. By 



J^G'-oAi')*'^' ± 5 Plli 



Because the cri' are distinct for i' G T, we have by Parseval's theorem 



i'£T 



= E( G '-°4i')^') 2 < 



i'eT 
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Since [X + Y\ 2 < 2 \X\ 2 + 2 \Y\ 2 for any X, Y, we get 



Uj — x'iUj" 



2 }<2\\x' T \\l + 2S 2 \\xf 1 

< 2Err 2 (x',k)/(aB) + 25 2 \\x\\l 

< 2p 2 /(aB). 



□ 



4.3 Properties of LocateSignal 

In our intuition, we made a claim that if j3 £ [n/ (16k;), n/(8w)] uniformly at random, and i > 16w, then 
— /3« is "roughly uniformly distributed about the circle" and hence not concentrated in any small region. This 
is clear if (3 is chosen as a random real number; it is less clear in our setting where /3 is a random integer in 
this range. We now prove a lemma that formalizes this claim. 

Lemma 4.3. Let T C [m] consist oft consecutive integers, and suppose j3 £ T uniformly at random. Then 
for any i £ [n] and set S C [to] of I consecutive integers, 

PrWi mod n £ S] < \im/n\ (1 + \l/i\)/t <- + — + — + - 

t nt nt it 

Proof. Note that any interval of length I can cover at most 1 + \ elements of any arithmetic sequence of 
common difference i. Then {(3i | /3 e T} C [im] is such a sequence, and there are at most \im/n \ intervals 
an + S overlapping this sequence. Hence at most \im/n \ (1 + \ of the (3 G [m] have (3i mod n E S. 
Hence 

Pr[pi mod n £ S] < \im/n\ (1 + [l/i\)/t. 

□ 

Lemma 4.4. Leti £ S. Suppose none of E co u(i), E Q f f(i), and E no i se (i) hold, and let j — h a ^{i). Consider 
any run of LOCATElNNER with i^cr,b{i) G [Ijj lj + w] ■ Let f > be a parameter such that 

afe 

for C larger than some fixed constant. Then 7t ai b(i) £ [?'•, E + Aw/t\ with probability at least 1 — tf nl - R '° c \ 
Proof. Let r = n a .b{i) = — b) (mod n), and for any j £ [n] define 

0* = —(j + ab) (mod 2tt) 
J n 

so 6* = ^-ai. Let g = 6(/ 1 / 3 ), and C = ^ = 6(1 /g 3 ). 

To get the result, we divide [lj, lj + w] into t "regions", Q q = [lj + ^-r-w, lj + |w] for q £ [t]. We 
will first show that in each round r, Cj is close to (39* with 1 — g probability. This will imply that Q q gets 
a "vote," meaning Vj. q increases, with 1 — g probability for the q' with r £ Q q >. It will also imply that Vj tq 
increases with only g probability when \q — q'\ > 3. Then Ri oc rounds will suffice to separate the two with 
1 - f~ Q ( R ioc) probability. We get that with 1 - £/- n R <°<=) probability, the recovered Q* has [q - q'[ < 3 
for all q £ Q* . If we take the minimum q £ Q* and the next three subregions, we find r to within 4 regions, 
or Aw/t locations, as desired. 

In any round r, define u = and a = a r . We have by Lemma l4~2l and that i £ S that 

\ „P 2 2fc 2 

] < 2— = 

olB Bat 

(JH — Ql 1*1 
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Note that 0(u; a< ") = —aO*. Thus for any p > 0, with probability 1 — p we have 











Uj - LU aal x'i 




/ C'p 


X i 



where ||x — j/||q = min 76 z |x — y + 27T7| denotes the "circular distance" between x and y. The analogous 
fact holds for <j){u'j) relative to 4>(x'i) — (a + 0)6* . Therefore with at least 1 — 2p probability, 



'rll 



O 



< 



< 



u,)-<K<;)-/ 

ftuj) - ^{x'i) - aO*)) - (^(u'j) - {(1,(9 i) -(a + 0)9* T )) 

[u' j) -((/>(? i)- (a + 0)6* T ) 



O 



(«,•) - (4>(x'i) - <) 



O 



o 



by the triangle inequality. Thus for any s = <d(g) andp = 9(g), we can set C = ps i n ^ sv /^ = ©(1/ff 3 ) so 
that 



;il <W2 



(4) 



with probability at least 1 — 2p. 

Equation (0| shows that Cj is a good estimate for i with good probability. We will now show that this 
means the approprate "region" Q q > gets a "vote" with "large" probability. 

For the q' with r S [Zj + ij + i-w], we have that rrij. q i = lj + q ~^ 2 w satisfies 



w 



so 



2/ 

-e jq ,\< 2 ^™. 

3A 1 - n 2i 



Hence by Equation (0|, the triangle inequality, and the choice of B < 

\\cj - pe M \\ Q <\\ Cj - p6* T \\ Q + ||/3<9* - P6 jig ,\\ 

S7T j3lTW 



2 Tit 

S7T S7T 

Y + Y 

S7T. 



Thus, will increase in each round with probability at least 1 — 2p. 

Now, consider q with \q — q'\ > 3. Then |r — rrij, q \ > and (from the definition of f3 > ^) we have 



7sn 3sn 



P\T-mj, q \ > — > 



(5) 



We now consider two cases. First, suppose that |t — rrij t q\ < In this case, from the definition of j3 it 
follows that 

/3\t — mj tq \ < n/2. 
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Together with Equation §5§ this implies 

Pr[/3(r - rrij tq ) mod n e [-3sra/4, 3sn/4]] = 0. 

On the other hand, suppose that |r — m^] > j|. In this case, we use Lemma R31 with parameters 
I = 3sn/2, m = IP^. i = 4^, i = (r — m,- „) and n = n, to conclude that 

^ r^/ x , r „/,„/, n 4w „ It — to, J „ 3sn st Aw 

Pr /3(t - m,- „) mod n € [-3sn/4, 3sn/4 < h 2-1 ^ + 3s H 

snt n 2 w snt 

Aw 2w 

< + — +9s 

snt n 

<4 + 9s 
sB 

< 10s 

where we used that \i\ < w < n/B, the assumption ^ < |£|, i > 1, s < 1, and that s 2 > 6/B (because 
s = G(g) andB = w(l/ 5 3 )). 

Thus in either case, with probability at least 1 — 10s we have 



2irp(mj, q - t) 



2tt 3sn 3 

> 7~ = n Sn 

q 7i 4 2 



for any g with \q — q'\ > 3. Therefore we have 

\\cj - P0 jA \\ o > - W\\ - \\ Cj - f39;\\ Q > stt 

with probability at least 1 — 10s — 2p, and Vj, q is not incremented. 

To summarize: in each round, Vj >q i is incremented with probability at least l — 2p and Vj >q is incremented 
with probability at most 10s + 2p for \q — q'\ > 3. The probabilities corresponding to different rounds are 
independent. 

Set s = gj 20 and p = g/A. Then Vj iq i is incremented with probability at least l—g and Vj, q is incremented 
with probability less than g. Then after i?/ oc rounds, if \q — q'\ > 3, 



Pr[v j>q > Rioc/2] < 
for g = / 1/3 1 A. Similarly, 

Pr[vj, q ' <Rioc/2]<f n{R '^. 

Hence with probability at least 1 — tj^(Rioa) we have q' G Q* and \q - q'\ < 3 for all g G Q*. But then 
t - I'j e [0, 4w/t] as desired. 

BecauseE[|{i E supp(z) | E f f (i)}\] — a II^Hq, the expected running time is 0(Ri oc Bt J rIli oc ^ log(n/5) + 
i?; oc P|| (l + alog(n/5))). □ 

Lemma 4.5. Suppose B = for C larger than some fixed constant. The procedure LOCATESIGNAL 
returns a set L of size \L\ < B such that for any i E S, Pr[i E L] > I — 0(a). Moreover the procedure runs 
in expected time 

0((- log(n/J) + ||z|| (1 + alog(n/6j)) log(n/B)). 
a 

Proof. Consider any i E S such that none of E co u(i),E ff(i), and E no i se (i) hold, as happens with proba- 
bility 1 - O(a). 

Set t = logn,t' = t/A and Ri oc = 0(log x / a (t/otj). Let wq = n/B and wd = u) /(t')' D_1 > so 
toj) Mi+ i < 1 for D max = log t /(w;o + 1) < t. In each round D, Lemma [4~4l implies that if K a ,b{i) € 
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l^jP+wn] then [if +1 \l\ D+1) 



l^-i-^i + U , D+1 ] w ith probability at least 1 - a"^'"' = 1 - a/t. 

(Dm««+1) ,(£> max +l) 



By a union bound, with probability at least 1 — a we have ix a b(i) £ 

{if— + \ Thus i = ^1(1 +1) )£L. 

Since Ri oc D m ax = 0(\og 1/a (t/a) \og t {n/B)) = 0(\og(n/B)), the running time is 



B 



0{D max {R loc - \og(n/S) + Rioc \\z\\ (1 + a log(n/<5)))) 
a 

= 0((- log(n/5) + \\z\\ Q (1 + a log(n/*))) log(»/fl)). 



□ 



4.4 Properties of EstimateValues 

Lemma 4.6. For any i G L, 



Pr[ 



w, - X'i 



> fi 2 ] < e ~ n ^ 



if B > — for some constant C. 



Proof. Define e r = u^'w a, <Ti — z'i in each round r. Suppose none of E^ u (i), E^(i), and E^ ise (i) 
hold, as happens with probability 1 — 0(a). Then by Lemma H~2l 



Ea r [|e,| 2 ] < 2 P 



2^ 2,^2 



Hence with 3/4 - 0(a) > 5/8 probability in total, 

M 2 < < M 2 /2 

for sufficiently large C. Then with probability at least 1 — e.-^ 1 ( R e»t) > both of the following occur: 



median real (e r ) < p 2 /2 

r 

2 

median imag(e r ) < p 2 /2. 



If this is the case, then |median r e r \ 2 < p 2 . Since Wi = x'i + median e r , the result follows. 



□ 



Lemma 4.7. Let R es t > C log -y^ /or some constant C and parameters 7, / > 0. 77ie« ;/ ESTIMATE VAL- 
UES is run wif/i input k! = 3k, it returns wj for \J\ = 3fc satisfying 



Err 2 (x' L - wj, fk) < Err 2 (x^, k) + 0(k^ 2 ) 

with probability at least 1 — 7. 

Proof. By Lemma l4~6l each index i 6 L has 



Prl 



Let U = {i e L I 



m - x'i 



> p 2 }. With probability 1 — 7, \U\ < fk; assume this happens. Then 



{x' - w) L \u 



(6) 
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Let T contain the top 2k coordinates of Wl\u- By the analysis of Count-Sketch (most specifically, Theo- 
rem 3.1 of 1PW1 II ). the loo guarantee (O means that 



x'l\u -Wt < Err 2 (V L \(7, k) + 3k/j 2 



(7) 



Because J is the top 3fc > (2 + f)k coordinates of wi, T C J. Let J' = J \ (T U U), so | J'| < fc. Then 

2 
2 



Err 2 ^ -w},fk)< 



X L\U ~ W J\U 



x'l\(uuj') - wt 



(x' - W)ji 



< 



x'h\U - W T 



+ \J'\ (x'-w)j. 



< Err 2 (P i \ c/ , k) + 3/c/i 2 + /c/i 2 
= Err 2 (P LW ,fc) + (9(fc Ai 2 ) 



where we used Equations © and (0. 



□ 



4.5 Properties of SparseFFT 

We will show that x — z^ r ^ gets sparser as r increases, with only a mild increase in the error. 

Lemma 4.8. Define x^ = x — z^ r \ Consider any one loop r of SparseFFT, running with parameters 
(B, k, a) = {B ri k r , a r ) such that B > ^r^far some C larger than some fixed constant. Then for any / > 0, 

Err 2 (x (r+1) ,/fc) < (l + e)ErT 2 {x {r) 1 k) + 0{e5 2 n\\x\\ 2 1 ) 
with probability 1 — 0{aj /), and the running time is 

O((||zW|| (l + abg(n/<0) + - log(n/*))(log — + log(n/B))). 

a ae 

Proof. We use R est = 0(log = 0(log ^) rounds inside EstimateValues. 
The running time for LocateSignal is 



0((- log(n/*) + H^Hoa + a\og(n/6))) \og(n/B)), 
a 

and for ESTIMATE VALUES is 

0((- log(n/<S) + ||? M ||o(l + alog(n/<5))) log — ) 



for a total running time as given. 



Recall that in round r, fx 2 = f (Err 2 (x {r \ k) + 6 2 n \\x\\l) and S = {i e [n] 



> M 2 }. By 



Lemma POl each i £ S lies in L r with probability at least 1 — 0(a). Hence \S \ L\ < fk with probability at 
least 1 - 0{a/f). Then 



Ew 2 (x^ L ,fk)< 



>]\(LUS) 



<Erv 2 (x^ L ,k) + kp, 2 . 



-(r) 

C [n]\(LUS) 



(8) 
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Let w = z( r+1 ) - z (r) = x^ - a;( r+1 ) by the vector recovered by ESTIMATE VALUES. Then supp(-u)) C L, 



Err 2 (x (r+1) , 2/fc) = Err 2 (x (r) - w, 2fk) 

< Err 2 (i^, fk) + Err 2 (^ r) - w, fk) 

< Err 2 (x^ ] \ i , fk) + Evv 2 {x ( [\k) + 0(kfi 2 ) 
by Lemma |4~7l But by Equation ([SJ, this gives 

Err 2 (^ r+1 \ 2fk) < Err 2 (^ L , k) + Err 2 (x { [ } , k) + 0(kpi 2 ) 

<Err 2 (x (r) ,fc) + 0(/cM 2 ) 

= (1 + 0(e)) Err 2 (£«, k) + 0(eS 2 n \\x\\l). 

The result follows from rescaling / and e by constant factors. 

Given the above, this next proof follows a similar argument to IIIPW1 II . Theorem 3.7. 

Theorem 4.9. With 2/3 probability, SparseFFT recovers z( R+1 ) such that 



□ 



< (1 + e) Err(z, k) + 5 \\x\\ 



in 0{ j log(n/fc) \og(n/8)) time. 

Proof. Define f r = 0(l/r 2 ) so £ U < V 4 - Choose R so Y[ r < R f r < < H r<R fr. Then R = 
0(logfc/loglogfc), since EUfl/r < (fR/2) R/2 = (2/R) R . 

Set e r = f r e, a r = 0(/ 2 ), k r = kJJ i<r f { , B r = 0(ja r f r ). Then B r = uj(-^-), so for sufficiently 
large constant the constraint of Lemma |4~81 is satisfied. For appropriate constants, Lemma l4~8l says that in 
each round r, 

Err 2 (^ +1 \fc r+1 ) = Err 2 (^ +1 ),/ r A; r ) < (1 + f r e) Err 2 (x^ , k r ) + 0(f r eS 2 n \\x\\l) (9) 
with probability at least 1 — f r . The error accumulates, so in round r we have 

ETT 2 (x^\k r )<ETT 2 (x,k)Y[(l + f l e) + Y,0(f r eS 2 7i\\x\\ 2 1 ) J] (1 + f j£ ) 

i<r i<r i<j<r 

with probability at least 1 — J2i< r fi > 3/4- Hence in the end, since fc_R+i — k J\ i<R fi < 1' 



(R+l) 



= EvT 2 (x^ +1 \k R+1 ) < Ett 2 (x, k) n (1 + f i£ ) + 0(ReS 2 n ||£|| 2 ) J] (1 + f t e) 



i<R 



i<R 



with probability at least 3 /4. We also have 

Y[(l + f t e)<e^^<e 

i 

making 

n(l + /^)<l + eX]/i e < 1 + 2e - 

i i 

Thus we get the approximation factor 



x — z 



(R+i) 



< (1 + 2e) Err 2 (x, jfc) + 0((log k)eS 2 n p|| 2 ) 
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with at least 3/4 probability. Rescaling 6 by poly(n), using < n \\x\\ 2 , and taking the square root gives 
the desired 

"x-z( R+ V 2 < (l + e)Err(z,fc)+(J||x|| 2 
Now we analyze the running time. The update round r has support size 3fc r , so in round r 

||? (r) ||o<^3fc r = 0(fc). 

Thus the expected running time in round r is 

0((fc(l + a r \og(n/S)) + ^ log(n/*))(log — + k>g(n/B r ))) 

k k r 2 

= 0((k + — log(n/<5) + —j log(n/J))(log — + log(ne/fc) + logr)) 

fc 

= 0((A; + —5- log(n/<5))(logr + log(n/fc))) 
er z 

We split the terms multiplying k and -\ log(n/<5), and sum over r. First, 

R 

^(logr + log(n/fc)) < R\ogR + Rlog(n/k) 

r=l 

< 0(log fc + log fc log(ra/fc)) 
= 0(logfclog(n/fc)). 



Next, 



Thus the total running time is 



R 

]T ^ (log r + log(n/fc)) = 0(log(n/fc)) 



Oik log k login/k) + - log(n/<5) \og(n/k)) = 0(- \og(n/S) log(n/fe)). 
e e 



□ 



5 Reducing the full &>dimensional DFT to the exact A: -sparse case in n 
dimensions 

In this section we show the following lemma. Assume that k divides n. 

Lemma 5.1. Suppose that there is an algorithm A that, given an n- dimensional vector y such that y is k- 
sparse, computes y in time T(k). Then there is an algorithm A' that given a k-dimensional vector x computes 
x in time 0{T{k))). 

Proof. Given a fc-dimensional vector x, we define yi = Xi mo( j for i = . . . n — 1. Whenever A requests a 
sample y^, we compute it from x in constant time. Moreover, we have that yi = Xi/(n/k) if i is a multiple of 
(n/k), and i/i = otherwise. Thus y is fc-sparse. Since x can be immediately recovered from y, the lemma 
follows. □ 

Corollary 5.2. Assume that the n- dimensional DFT cannot be computed in o(n log n) time. Then any algo- 
rithm for the k-sparse DFT (for vectors of arbitrary dimension) must run in Q(k log k) time. 
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6 Lower Bound 



In this section, we show any algorithm satisfying Equation (Q~|i must access Q(k log(n/fc) / log log n) samples 
of x. 

We translate this problem into the language of compressive sensing: 

Theorem 6.1. Let F G C nx " be orthonormal and satisfy \Fi.j\ = 1/ y/nfor all i, j. Suppose an algorithm 
takes m adaptive samples of Fx and computes x' with 

\\x ~ ^'11 9 < 2 min \\x — x*\\ 2 

k-sparse x* 

for any x, with probability at least 3/4. Then it must have m = tt(k log(n/fc) / log log n). 

Corollary 6.2. Any algorithm computing the approximate Fourier transform must access Q(k log(n jk) / log log 
samples from the time domain. 

If the samples were chosen non-adaptively, we would immediately have m = Vl(k log(n/fc)) by HPW1 1| . 
However, an algorithm could choose samples based on the values of previous samples. In the sparse recovery 
framework allowing general linear measurements, this adaptivity can decrease the number of measurements 
to 0(k log log(n/fc)) BIPW1 II : in this section, we show that adaptivity is much less effective in our setting 
where adaptivity only allows the choice of Fourier coefficients. 

We follow the framework of Section 4 of IIPW1 U . In this section we use standard notation from infor- 
mation theory, including I(x; y) for mutual information, H(x) for discrete entropy, and h(x) for continuous 
entropy. Consult a reference such as IICT9 1 II for details. 

Let F C {S C [n] : |5"| = A;} be a family of fc-sparse supports such that: 

• \S ® S'\ > k for S 7^ S' G F, where © denotes the exclusive difference between two sets, and 

• log 1^1 = Q(klog(n/k)). 

This is possible; for example, a random code on [n/k] k with relative distance 1 /2 has these properties. 

For each S G F, let X s = {x G {0,±1}" | supp(x 5 ) = S}. Let x G X s uniformly at random. The 
variables Xi, i G S, are i.i.d. subgaussian random variables with parameter a 2 = 1, so for any row Fj of F, 
FjX is subgaussian with parameter er 2 = k/n. Therefore 

Pr [\Fjx\ > t^/kJVi] < 2e-' 2/2 
xex s 

hence for each S, we can choose an x G X s with 

Let X = {x s | S G F} be the set of such x s . 

Letw~iV(0,a^/ n )be i.i.d. normal with variance ak/n in each coordinate. 
Consider the following process: 

Procedure. First, Alice chooses S G F uniformly at random, then selects the x G X with supp(a;) = S. 
Alice independently chooses w ~ A^(0, a^I n ) for a parameter a = 0(1) sufficiently small. For j G [m], 
Bob chooses ij G [n] and observes yj = FiAx + w). He then computes the result x' « x of sparse 
recovery, rounds to X by i = argmin x , g j s: \\x* — x'\\ 2 , and sets S' = supp(x). This gives a Markov chain 
S — y x — y y — y x' — y x — y S' . 

We will show that deterministic sparse recovery algorithms require large m to succeed on this input 
distribution x + w with 3 /4 probability. By Yao's minimax principle, this means randomized sparse recovery 
algorithms also require large m to succeed with 3/4 probability. 

Our strategy is to give upper and lower bounds on I(S; S'), the mutual information between S and 5". 
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Lemma 6.3 (Analog of Lemma 4.3 of BPW1 II for e = 0(1)). There exists a constant a 1 > such that if 
a < a', then I(S; S') = Q(klog(n/k)) . 

Proof. Assuming the sparse recovery succeeds (as happens with 3/4 probability), we have || x' — (x + w)\\ 2 < 
2 ||wj| 2 , which implies \\x' — x\\ 2 < 3 ||u>|| 2 . Therefore 

\\x-x\\ 2 < \\x-x'\\ 2 +\\x' -x\\ 2 
<2\\x'-x\\ 2 
<6|H| 2 . 

We also know \\x' — x"\\ 2 > V~k for all distinct x', x" £ X by construction. Because E[||tt»|| 2 ] = ak, with 
probability at least 3/4 we have ||to|| 2 < V4afc < \fkj l 6 for sufficiently small a. But then \\x — x\ 2 < \f~k~, 
so x = x and S = S' . Thus Pr[5 ^ S'] < 1/2. 

Fano's inequality states H(S \S')<1 + Pr[S ^ S'} log | T\. Thus 

7(5; S') = H(S) -H(S\S')>-l + ± log \T\ = log(n/A)) 

as desired. □ 

We next show an analog of their upper bound (Lemma 4.1 of BPW1 11 ) on I(S; S') for adaptive measure- 
ments of bounded £oo norm. The proof follows the lines of OPWl 11 . but is more careful about dependencies 
and needs the bound on Fx. 

Lemma 6.4. 

I(S;S') < 0(mlog(l + - logn)). 

a 

Proof. Let Aj = Fi j for j € [m], and let w'j = AjW. The w^- are independent normal variables with variance 
aK Because the Aj are orthonormal and w is drawn from a rotationally invariant distribution, the w' are also 
independent of x. 

Let yj = AjX + w'j. We know I(S; S') < I(x; y) because S*— ^x— ^y— s-S"isa Markov chain. Because 
the variables Aj are deterministic given y\, ... , j/j-i, 

■ | j/i, . . . , yj-i) = I(x; Ajx + w' j \y 1 ,..., yj-!) 

= h(AjX + w'j | Z/i, - - .,2/j-i) - /i(A/a: + | x,y 1} . . .,yj-i) 
= h(AjX + w^ \yi, .. .,Vj-i) - h(w'j). 

By the chain rule for information, 

I(S;S')<I(x;y) 

m 

= ^I{x\y 3 | y 1 ,...,y j - 1 ) 

3=1 
m 

= h{AjX + w' j \y 1 ,..., yj-!) - h{w' j ) 
<Y, h ^ 3 x + w'j)-h{w'j). 
Thus it suffices to show h(AjX + w'j) — h(w'j) = 0(log(l + ^ logn)) for all j. 
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Note that A, depends only on yi, . . . , yj-i, so it is independent of w'j . Thus 

E[(A jX + w^) 2 ] = E[(A jX ) 2 } + E[( w ;-) 2 ] < 0(^^) + a- 
by Equation (1101) , Because the maximum entropy distribution under an £2 constraint is a Gaussian, we have 

h(A jX + w'A - h{w' t ) < h(N(0, 0(^i^) + a-)) - ft (MO, a-)) 
j j n n n 

2 a 

= 0(log(l + -logn)). 

a 

as desired. □ 
Theorem l6. 1 I follows from Lemma 1631 and Lemma l6~4l with a = 8(1). 

7 Efficient Constructions of Window Functions 

Claim 7.1. Let cdf denote the standard Gaussian cumulative distribution function. Then: 

1. cdf (t) = 1 - cdi(-t). 

2. cdf(f) < e~ t2 / 2 fort < 0. 



3. cdf(t) < 5 fort < -v/21og(l/(5). 

4 - JL-oc cM(x)dx < 5 fort < -V21og(3/(S). 



cdf — cdf^ 



< (5. 



5. For any 5, there exists a function cdf^(t) computable in 0(log(l/<5)) time such that 
Proof. 

1 . Follows from the symmetry of Gaussian distribution. 

2. Follows from a standard moment generating function bound on Gaussian random variables. 

3. Follows from (2). 

4. Property (2) implies that cdf (t) is at most v / 2~7r < 3 times larger than the Gaussian pdf. Then apply (3) 



5. By (1) and (3), cdf(i) can be computed as ±<5 or 1 ± 5 unless \t\ < -\/2(log(l/<5)). But then an efficient 
expansion around only requires (3(log(l/<5)) terms to achieve precision ±5. 

For example, we can truncate the representation llMar04l 

Jr/ . 1 e-* 2 / 2 ( t 3 t 5 t 7 
cdf (t) = - + -=- [t + - 



2 V2^ V 3 3-5 3-5-7 
at 0(log(l/(5)) terms. 

□ 
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Claim 7.2. Define the continuous Fourier transform of f(t) by 



f(s) = / e^ ist f(t)dt. 

J — OO 

For t £ [n], define 

OO 

gt = Vn E /( f + n 'j) 

and 

OO 

9t= E /(*/* + J")- 

J=— OO 

77ie« g = g', where g is the n-dimensional DFT of g. 

Proof. Let Ai(i) denote the Dirac comb of period 1: Ai(t) is a Dirac delta function when t is an integer and 
zero elsewhere. Then Ai = Ai. For any t € [n], we have 

n oo 

3*=E E f(s + nj)e-™^ 

s—1 j — — OO 

n oo 

= E E /(s + «j)e" 2 " it(s+nj)/ " 

s—1 j — — OO 

OO 

= E f(s)e~ 27Tits/n 

X) 

/(s)A 1 (s)e- 27ri * s /"d S 



= (/-A x )(t/n) 
= (7*Ax)(t/n) 

OO 

= E /(*/» + i) 



□ 



Lemma 7.3. For any parameters B > 1, <5 > 0, a«c/ a > 0, f/iere exist flat window functions G and G' such 
that G can be computed in 0(— log(n/<5)) f/me, and for each i G'i can be evaluated in 0(\og(n/S)) time. 

Proof. We will show this for a function G' that is a Gaussian convolved with a box-car filter. First we 
construct analogous window functions for the continuous Fourier transform. We then show that discretizing 
these functions gives the desired result. 

Let D be the pdf of a Gaussian with standard deviation a > 1 to be determined later, so D is the pdf of 
a Gaussian with standard deviation 1/er. Let F be a box-car filter of length 2C for some parameter C < 1; 
that is, let F(t) = 1 for \t\ < C and F(t) = otherwise, so F(t) = 2Csinc(i/(2C)). Let G* = D ■ F, so 
G* = D * F. 

Then \G*(t)\ < 2C\D(t)\ < 2C5 for \t\ > a^/2\og{\/5). Furthermore, G* is computable in 0(1) 
time. 

Its Fourier transform is G*(t) = cd£(a(t + C)) - cdf(a(i - C)). By Claim Owe have for \t\ > 
C + y/2log{l/8)/a that G*(t) = ±6. We also have, for |i| < C - ^2 \og(l/S)/a, that G*(t) = l± 25. 
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Now, for i G [n] let H % = v^E^o G*(i + nj). By Claim E2 it has DFT Hi = Y^Loo G*{i/n + j) 



Furthermore, 



j=oo 



1*1X7-^2 log(l/5) 



< 4C 



•i<- ( r^/21og(l/5) 
— CT V 21 °g( 1 /'5) 



|D(x)| + D(-ay/21og(l/6)) 



Thus if we let 



< 4C(cdf(-V21og(l/<5)) + D{-ayf 2\og{l/ 5))) 

< 8G<5 < 85. 



|i|<cr N / 21 °g( 1 /'5) 
j=i (mod n) 



for |i| < (Ty/2\og(l/5) and G; = otherwise, then \\G - H\\ 1 < 85y/n. 
Now, note that for integer i with \i\ < n/2, 

H i -G*(i/n)=J2G*(i/n + j) 



Hi-G*(i/n)| <2£G*(-l/2-j) 

OO 

< 2£cdf(<7(-l/2- j + G)) 

1/2 

cdf(a(.T + C))dx + 2cdf(<r(-l/2 + C*)) 



by Claim r7TTI as long as 



< 2 

< 25/a + 25 < 45 



(7(1/2 - C) > v/21og(3/5). 



Let 



(11) 



1 |i| < n(G- N /21og(l/J) /(r) 

_ 0_ |i| >n(G + V21og(l/<5)/a) 

cdf 5(cr(i + C)/n) — cdfa(cr(i — C)/n) otherwise 

where cdfs(i) computes cdf(i) to precision ±5 in 0(log(l/5)) time, as per Claim ITTI Then G'i 
G*(i/n) ±25 = Hi± 65. Hence 



G-G 7 


< 

OO 


G 7 


- H 


+ 

OO 


G- 


H 


30 




< 


G 7 


- H 


+ 

OO 


G - 


H 


2 






G 7 


- H 


+ 1 

OO 


G - 


H\\ 2 




< 


G 7 


- H 


+ 1 

OO 


G- 


H\\i 



< (8^ + 6)6. 
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Replacing S by S/n and plugging in a = — \og(n/6) > 1 and C = (1 — a/2) / (2B) < 1, we have the 
required properties of flat window functions: 

• = for \i\ > fi(f \og(n/S)) 

• G'i = 1 for | i\ < (1 - a)n/(2B) 

• G 7 ; = for |*| > n/(2B) 

• G'i € [0, 1] for all i. 



G'-G 



< 5. 



We can compute G over its entire support in \og(n/5)) total time. 



• For any i, G'i can be computed in 0(log(n/<5)) time for \i\ 6 [(1 — a)n/{2B) 1 n/{2B)] and 0(1) time 
otherwise. 

The only requirement was Equation dTTb . which is that 

— V / 21og(n/ ( 5)(l/2 - > V21og(3n/5). 

This holds if £? > 2. The _B = 1 case is trivial using the constant function G'i = 1. □ 

8 Open questions 

• Design an 0(k log n)-time algorithm for general signals. Alternatively, prove that no such algorithm exists, 
under "reasonable" assumptions^ 

• Reduce the sample complexity of the algorithms. Currently, the number of samples used by each algorithm 
is only bounded by their running times. 

• Extend the results to other (related) tasks, such as computing the sparse Walsh-Hadamard Transform. 

• Extend the algorithm to the case when n is not a power of 2. Note that some of the earlier algorithms, 
e.g., IIGMS05I . work for any n. 

• Improve the failure probability of the algorithms. Currently, the algorithms only succeed with constant 
probability. Straightforward amplification would take a log(l/p) factor slowdown to succeed with 1 — p 
probability. One would hope to avoid this slowdown. 
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10 The Q(k log(n/fc)/ log log n) lower bound for the sample complexity shows that the running time of our algorithm, 
0(k log n log(n/fc)), is equal to the sample complexity of the problem times (roughly) logn. One could speculate that this loga- 
rithmic discrepancy is due to the need for using FFT to process the samples. Although we do not have any evidence for the optimality 
of our general algorithm, the "sample complexity times log n" bound appears to be a natural barrier to further improvements. 
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